NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Approximate Nearest Neighbors Beyond Space Partitions

https://doi.org/10.1137/1.9781611976465.72

Andoni, Alexandr; Nikolov, Aleksandar; Razenshteyn, Ilya P.; Waingarten, Erik (January 2021, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA))

Full Text Available
Learning Space Partitions for Nearest Neighbor Search

Dong, Yihe; Indyk, Piotr; Razenshteyn, Ilya P.; Wagner, Tal (May 2020, ICLR)

Full Text Available
Scalable nearest neighbor search for optimal transport

Backurs, Arturs; Dong, Yihe; Indyk, Piotr; Razenshteyn, Ilya; Wagner, Tal (June 2020, ICML)

Full Text Available
Non-Adaptive Adaptive Sampling on Turnstile Streams

https://doi.org/10.1145/3357713.3384331

Mahabadi, Sepideh; Razenshteyn, Ilya; Woodruff, David P.; Zhou, Samson (January 2020, STOC)

Full Text Available
Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering

https://doi.org/10.1145/3313276.3316350

Makarychev, Konstantin; Makarychev, Yury; Razenshteyn, Ilya (June 2019, STOC)

Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of $(1+\varepsilon)$ under a projection onto a random $O(\log(k/\varepsilon)/\varepsilon^2)$-dimensional subspace. Further, the cost of every clustering is preserved within a factor of $(1+\varepsilon)$. More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean k-clustering with the distances raised to the $p$-th power for any constant $p$. For k-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for k-medians, it answers a question raised by Kannan.
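The projection statement above can be checked numerically on small inputs. The sketch below is an illustration only, not the paper's proof or experiments: it assumes a dense Gaussian matrix scaled by 1/sqrt(m) as the dimension reduction map (one standard map with sub-Gaussian tails), and the helper names `gaussian_projection` and `clustering_cost` are hypothetical. It fixes an arbitrary clustering and compares its k-means cost before and after projection.

```python
import math
import random


def gaussian_projection(points, target_dim, seed=0):
    """Map each x in R^d to G x / sqrt(m), where G is an m-by-d matrix of i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    d = len(points[0])
    g = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(target_dim)]
    scale = 1.0 / math.sqrt(target_dim)
    return [[scale * sum(a * b for a, b in zip(row, x)) for row in g] for x in points]


def clustering_cost(points, labels):
    """k-means cost of a fixed partition: sum of squared distances to each cluster's centroid."""
    clusters = {}
    for x, label in zip(points, labels):
        clusters.setdefault(label, []).append(x)
    cost = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(x[j] for x in members) / len(members) for j in range(dim)]
        cost += sum(sum((x[j] - centroid[j]) ** 2 for j in range(dim)) for x in members)
    return cost


if __name__ == "__main__":
    rng = random.Random(1)
    d, n, k, m = 100, 40, 4, 200  # illustrative sizes, not taken from the paper
    points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    labels = [i % k for i in range(n)]  # an arbitrary (not optimal) clustering
    before = clustering_cost(points, labels)
    after = clustering_cost(gaussian_projection(points, m), labels)
    print(f"cost ratio after/before projection: {after / before:.3f}")
```

Because the map is linear, the centroid of a projected cluster equals the projection of the original centroid, so computing the cost directly in the projected space is consistent with the k-means objective. The target dimension here is chosen generously for a stable demonstration; it is not the bound from the paper.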
Full Text Available
Adversarial examples from computational constraints

Bubeck, Sebastien; Lee, Yin Tat; Price, Eric; Razenshteyn, Ilya (January 2019, Proceedings of Machine Learning Research)

Why are classifiers in high dimension vulnerable to “adversarial” perturbations? We show that this is likely not due to information-theoretic limitations, but rather could be due to computational constraints. First, we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then, we give two particular classification tasks where learning a robust classifier is computationally intractable. More precisely, we construct two binary classification tasks in high-dimensional space which are (i) information-theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, and (iii) yet not efficiently robustly learnable, even for small perturbations. Specifically, for the first task hardness holds for any efficient algorithm in the statistical query (SQ) model, while for the second task we rule out any efficient algorithm under a cryptographic assumption. These examples give an exponential separation between classical learning and robust learning in the statistical query model or under a cryptographic assumption. This suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms.
Full Text Available
Nonlinear dimension reduction via outer Bi-Lipschitz extensions

https://doi.org/10.1145/3188745.3188828

Mahabadi, Sepideh; Makarychev, Konstantin; Makarychev, Yury; Razenshteyn, Ilya (June 2018, STOC)

We introduce and study the notion of *an outer bi-Lipschitz extension* of a map between Euclidean spaces. The notion is a natural analogue of the notion of *a Lipschitz extension* of a Lipschitz map. We show that for every map $f$ there exists an outer bi-Lipschitz extension $f'$ whose distortion is greater than that of $f$ by at most a constant factor. This result can be seen as a counterpart of the classic Kirszbraun theorem for outer bi-Lipschitz extensions. We also study outer bi-Lipschitz extensions of near-isometric maps and show upper and lower bounds for them. Then, we present applications of our results to prioritized and terminal dimension reduction problems, described next. We prove a *prioritized* variant of the Johnson-Lindenstrauss lemma: given a set of points $X \subset \mathbb{R}^d$ of size $N$ and a permutation ("priority ranking") of $X$, there exists an embedding $f$ of $X$ into $\mathbb{R}^{O(\log N)}$ with distortion $O(\log\log N)$ such that the point of rank $j$ has only $O(\log^{3+\varepsilon} j)$ non-zero coordinates (more specifically, all but the first $O(\log^{3+\varepsilon} j)$ coordinates are equal to 0), and the distortion of $f$ restricted to the first $j$ points (according to the ranking) is at most $O(\log\log j)$. This result makes progress towards answering an open question by Elkin, Filtser, and Neiman about prioritized dimension reductions. We also prove that, given a set $X$ of $N$ points in $\mathbb{R}^d$, there exists a *terminal* dimension reduction embedding of $\mathbb{R}^d$ into $\mathbb{R}^{d'}$, where $d' = O(\log N/\varepsilon^4)$, which preserves distances $\|x-y\|$ between points $x \in X$ and $y \in \mathbb{R}^d$ up to a multiplicative factor of $1 \pm \varepsilon$. This improves a recent result by Elkin, Filtser, and Neiman. The dimension reductions that we obtain are nonlinear, and this nonlinearity is necessary.
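The terminal guarantee in this abstract constrains pairs with one point in X and the other anywhere in R^d. The hedged sketch below only measures that quantity empirically on a finite sample of query points, so it can expose a bad map but cannot certify a good one. The function name `terminal_distortion` and the example map `stretch_y` are hypothetical, and distortion is taken here as the ratio of the largest to the smallest distance-scaling factor observed.

```python
import math
from itertools import product


def terminal_distortion(f, terminals, queries):
    """Empirical distortion of f over pairs (x, y), x a terminal and y a query point.

    Returns max ratio / min ratio of ||f(x) - f(y)|| / ||x - y||; 1.0 means every
    measured distance was scaled by the same factor. Only the sampled pairs are checked.
    """
    ratios = []
    for x, y in product(terminals, queries):
        original = math.dist(x, y)
        if original > 0:
            ratios.append(math.dist(f(x), f(y)) / original)
    if not ratios:
        raise ValueError("need at least one pair of distinct points")
    smallest = min(ratios)
    if smallest == 0:
        return math.inf  # f collapsed two distinct points
    return max(ratios) / smallest


def stretch_y(p):
    """Hypothetical example map on R^2 that doubles the second coordinate."""
    return (p[0], 2.0 * p[1])


if __name__ == "__main__":
    terminals = [(0.0, 0.0), (1.0, 1.0)]
    queries = [(1.0, 0.0), (0.0, 1.0), (3.0, -2.0)]
    print(terminal_distortion(stretch_y, terminals, queries))
```

A map that scales every distance by the same constant reports 1.0, while `stretch_y` scales horizontal differences by 1 and vertical differences by 2, so any sample containing both directions reports 2.0.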
Full Text Available
Practical Data-Dependent Metric Compression with Provable Guarantees

Indyk, Piotr; Razenshteyn, Ilya P.; Wagner, Tal (January 2017, Annual Conference on Neural Information Processing Systems)

We introduce a new distance-preserving compact representation of multi-dimensional point sets. Given $n$ points in a $d$-dimensional space where each coordinate is represented using $B$ bits (i.e., $dB$ bits per point), it produces a representation of size $O(d\log(dB/\varepsilon) + \log n)$ bits per point from which one can approximate the distances up to a factor of $1+\varepsilon$. Our algorithm almost matches the recent bound of Indyk et al. (2017) while being much simpler. We compare our algorithm to Product Quantization (PQ) (Jegou et al., 2011), a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT, MNIST, New York City taxi time series, and a synthetic one-dimensional data set embedded in a high-dimensional space. Our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
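To make the bits-per-point versus 1 + epsilon trade-off concrete, the sketch below implements a naive baseline, uniform scalar quantization of every coordinate, which is not the algorithm from the paper. It spends d·q bits per point for q bits per coordinate, and points closer together than one quantization step can be mapped to the same code, so it gives no multiplicative guarantee for all pairs. The function names `quantize` and `max_relative_distance_error` are hypothetical.

```python
import math
from itertools import combinations


def quantize(points, bits):
    """Naive uniform scalar quantization (a baseline, not the paper's method).

    Each coordinate is rounded to one of 2**bits levels spanning the data range,
    so a point costs len(point) * bits bits, plus the shared lo/step values.
    """
    lo = min(min(p) for p in points)
    hi = max(max(p) for p in points)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [[round((v - lo) / step) for v in p] for p in points]
    decoded = [[lo + c * step for c in code] for code in codes]
    return codes, decoded


def max_relative_distance_error(points, approx):
    """Worst |d_approx - d_exact| / d_exact over all pairs of distinct points."""
    worst = 0.0
    for i, j in combinations(range(len(points)), 2):
        exact = math.dist(points[i], points[j])
        if exact > 0:
            worst = max(worst, abs(math.dist(approx[i], approx[j]) - exact) / exact)
    return worst


if __name__ == "__main__":
    pts = [[0.0, 0.0, 1.0], [0.9, 0.1, 0.4], [0.5, 0.5, 0.5], [0.51, 0.5, 0.5]]
    for bits in (2, 4, 8):
        _, approx = quantize(pts, bits)
        err = max_relative_distance_error(pts, approx)
        print(f"{bits * len(pts[0])} bits/point: worst relative error {err:.3f}")
```

The last two example points are deliberately close, so running the loop shows how the worst-case relative error of this baseline depends on the bit depth q.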
Full Text Available
